A random variable \(Y\) is a function that maps outcomes from the sample space
\(S\) of a random experiment to real numbers.

  • If \(Y\) takes at most countably many values (e.g., integers), it is called a discrete random variable.

  • The probability mass function (PMF) is denoted by \(P(Y = x),\ x \in \mathbb{N}\).

  • The cumulative distribution function (CDF) is denoted by \(P(Y \leq x),\ x \in \mathbb{N}\).

  • The expectation and variance of \(Y\) are denoted by \(E(Y)\) and \(\text{Var}(Y)\), respectively.

Statistic vs. Estimate

A random sample of size \(n\) from \(Y\) is denoted as:

\[ Y_1, Y_2, \dots, Y_n. \]

  • A statistic is any function of the sample, such as the sample mean \(\bar{Y}_n\), which estimates the population mean \(E(Y)\).

  • An estimator is a statistic used to infer the value of an unknown parameter; an estimate is the realized value of an estimator for an observed sample.

Example

  • Example 2.1-3, Hogg, Tanis and Zimmerman, 9th Edition, p.44:
    Roll a fair four-sided die twice and let \(X\) be the maximum of the two outcomes.

1 Under the fair-die assumption

  • \(P(X = k) = \frac{2k - 1}{16}, \quad k = 1, 2, 3, 4.\)

  • Expectation: \(E(X) = \frac{25}{8} \approx 3.125.\)

  • Variance: \(\text{Var}(X) = \frac{55}{64} \approx 0.8594,\ \text{SD}(X) \approx 0.927.\)
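These values can be verified with a short computation; a minimal Python sketch using exact fractions (the derivation \(P(X = k) = (k/4)^2 - ((k-1)/4)^2\) follows from \(X\) being the maximum of two independent rolls):

```python
from fractions import Fraction

# PMF of X = max of two independent rolls of a fair four-sided die:
# P(X = k) = P(both rolls <= k) - P(both rolls <= k - 1) = (k/4)^2 - ((k-1)/4)^2
pmf = {k: Fraction(k, 4) ** 2 - Fraction(k - 1, 4) ** 2 for k in range(1, 5)}

mean = sum(k * p for k, p in pmf.items())               # E(X)
var = sum(k**2 * p for k, p in pmf.items()) - mean**2   # Var(X)

print(pmf)         # equals (2k - 1)/16 for k = 1, 2, 3, 4
print(mean, var)   # 25/8 and 55/64
```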


2 Sample observations and statistics

Draw \(n\) samples \(x_1, x_2, \dots, x_n\) from this distribution; the following 7 statistics can then be computed:

  • Probability estimates: \(\hat{P}(X = k) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i = k\}}, \quad k = 1,2,3,4.\)

  • Sample mean: \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.\)

  • Sample variance: \(s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.\)

  • Chi-square statistic (an overall measure of the discrepancy between the estimated and expected probabilities): \[ \chi^2 = \sum_{k=1}^{4} \frac{(o_k - E_k)^2}{E_k}, \]

    • \(o_k\) is the observed count of the \(k\)-th outcome, \(n \hat{P}(X = k)\),

    • \(E_k\) is the expected count of the \(k\)-th outcome, \(n P(X = k)\).

  • First, write a visualization (non-interactive) version to observe how the error between the simulated statistics (e.g., the mean and standard deviation) and their theoretical values evolves as the sample size changes.

  • Then extend this into an interactive Shiny App version, letting users freely adjust the sample size and the die weights and observe the absolute differences in real time.
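As a first, non-interactive step, the statistics above can be computed from one simulated sample. A minimal Python sketch (the document's own simulations appear to use R; all names here are illustrative):

```python
import random
from collections import Counter

random.seed(1)
n = 1000
p = [(2 * k - 1) / 16 for k in range(1, 5)]                    # theoretical P(X = k)
x = [max(random.randint(1, 4), random.randint(1, 4)) for _ in range(n)]

counts = Counter(x)
p_hat = [counts[k] / n for k in range(1, 5)]                   # probability estimates
xbar = sum(x) / n                                              # sample mean
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)               # sample variance

# chi-square statistic: o_k = observed counts, E_k = n * P(X = k)
chi2 = sum((counts[k] - n * pk) ** 2 / (n * pk)
           for k, pk in zip(range(1, 5), p))

print(p_hat, xbar, s2, chi2)
```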


3 Statistics are also random variables

  • \(Y_1, \cdots, Y_n\) is a random sample of \(Y\). Hence, \(h(Y_1, \cdots, Y_n)\) is a random variable,
    where \(h\) is a function that maps the sample to a real number.

    • \(\hat{P}(X = k) = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i = k\}}, \quad k = 1, 2, 3, 4.\)

    • \(\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.\)

    • \(S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.\)

  • Usually, it is not easy to derive the distribution of \(h(Y_1, \cdots, Y_n)\), even if we know the distribution of \(Y\).

  • If it is not feasible to derive the distribution analytically, we can approximate it by repeating the sampling process \(n_{\text{rep}}\) times, namely generating \(n_{\text{rep}}\) values of \(h(Y_1, \cdots, Y_n)\).
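This replication idea can be sketched as follows (a minimal Python version; `sample_max` and `replicate` are illustrative names):

```python
import random

def sample_max(n):
    """Draw n values of X = max of two fair four-sided dice."""
    return [max(random.randint(1, 4), random.randint(1, 4)) for _ in range(n)]

def replicate(statistic, n, n_rep):
    """Generate n_rep values of h(Y_1, ..., Y_n) to approximate its distribution."""
    return [statistic(sample_max(n)) for _ in range(n_rep)]

random.seed(42)
# 1000 replicated values of the sample mean, each based on a sample of size 50;
# their empirical distribution approximates that of the statistic.
xbar_values = replicate(lambda xs: sum(xs) / len(xs), n=50, n_rep=1000)
```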

3.1 Probability-estimate statistics as the sample size varies

  • For a continuous statistic \(h(Y_1, \cdots, Y_n)\), histograms and boxplots serve as useful tools for exploring its distributional properties and potential patterns.

Below, for \(n = 50, 100, 500, 1000\), the four probability estimates, the sample mean, the sample variance, and the sample standard deviation above are each generated \(n_{\text{rep}} = 1000\) times. The values in each row come from the same simulated data set.

## 'data.frame':    4000 obs. of  8 variables:
##  $ n       : num  50 50 50 50 50 50 50 50 50 50 ...
##  $ max_1   : num  0.04 0.08 0.06 0.06 0.08 0.1 0.1 0.08 0.04 0.06 ...
##  $ max_2   : num  0.22 0.12 0.1 0.2 0.08 0.2 0.14 0.22 0.18 0.3 ...
##  $ max_3   : num  0.32 0.4 0.34 0.36 0.28 0.18 0.34 0.22 0.32 0.28 ...
##  $ max_4   : num  0.42 0.4 0.5 0.38 0.56 0.52 0.42 0.48 0.46 0.36 ...
##  $ max_mean: num  3.12 3.12 3.28 3.06 3.32 3.12 3.08 3.1 3.2 2.94 ...
##  $ max_var : num  0.802 0.842 0.777 0.833 0.875 ...
##  $ max_sd  : num  0.895 0.918 0.882 0.913 0.935 ...

  • Weak Law of Large Numbers (WLLN): Let \(Y_1, Y_2, \ldots\) be i.i.d. random variables with \(E(Y) < \infty\).
    Then for any \(\varepsilon > 0\), \[ \lim_{n \to \infty} P\left( \left| \bar{Y}_n - E(Y) \right| \geq \varepsilon \right) = 0, \] denoted by \[ \bar{Y}_n \xrightarrow{p} E(Y). \]

  • Properties of the Sample Mean \(E(\bar{Y}_n) = E(Y)\) (unbiased estimator) and \(\text{Var}(\bar{Y}_n) = \frac{1}{n}\text{Var}(Y)\).

    • \(\hat{p}_k = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i = k\}}\xrightarrow{p} P(X = k).\) (for \(k = 1, 2, 3, 4\))

    • \(\text{Var}(\hat{p}_k) = \frac{1}{n} p_k (1 - p_k)\).

  • Central Limit Theorem (CLT):
    Let \(Y_1, Y_2, \cdots\) be i.i.d. random variables with \(E(Y) < \infty\) and \(\text{Var}(Y) < \infty\). Then, for all \(z \in \mathbb{R}\), \[ \lim_{n \rightarrow \infty} P\left( \frac{\sqrt{n}(\bar{Y}_n - E(Y))}{\sqrt{\text{Var}(Y)}} \leq z \right) = \Phi(z), \] which is often denoted as: \[ \frac{\sqrt{n}(\bar{Y}_n - E(Y))}{\sqrt{\text{Var}(Y)}} \xrightarrow{d} N(0, 1). \]

  • Example: By the CLT, we have: \[ \frac{\sqrt{n}(\hat{p}_k - p_k)}{\sqrt{p_k(1 - p_k)}} \xrightarrow{d} N\left( 0, 1\right). \tag{1} \]

    • That is, for large \(n\), \[ \hat{p}_k \approx N\left( p_k, \, \frac{p_k(1 - p_k)}{n} \right) \]

    • \(n \hat{p}_k = \sum_{i=1}^{n} I_{\{X_i = k\}} \sim \text{Binomial}(n, p_k).\)

    • Moreover, \(\widehat{\text{Var}}(I_{\{X_i = k\}}) = \hat{p}_k (1 - \hat{p}_k) \xrightarrow{p} p_k (1 - p_k),\) by the continuous mapping theorem. Then using Slutsky’s theorem, this together with \[ \sqrt{n}(\hat{p}_k - p_k) \xrightarrow{d} N(0, p_k(1 - p_k)) \] yields \[ \frac{\sqrt{n}( \hat{p}_k - p_k )}{ \sqrt{ \hat{p}_k (1 - \hat{p}_k) } } \xrightarrow{d} N(0, 1). \tag{2} \]

    • The sample variance based on \(I_{\{X_i = k\}}, \ i = 1, \ldots, n\), is an unbiased estimator of the true variance \(\text{Var}(I_{\{X_i = k\}}) = p_k(1 - p_k).\) In contrast, \(\widehat{\text{Var}}(I_{\{X_i = k\}}) = \hat{p}_k (1 - \hat{p}_k)\) is a biased estimator, but it is consistent. Similarly, following the same reasoning as in Equation (2), we can derive an alternative version of the distributional convergence by substituting the population variance with the sample variance.
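Equations (1) and (2) can be checked by simulation: standardize \(\hat{p}_k\) once with the theoretical variance and once with the plug-in variance, and both collections of Z-values should look approximately \(N(0,1)\) for large \(n\). A minimal Python sketch (the choices \(n = 500\), \(k = 4\) are illustrative):

```python
import random
import statistics

random.seed(0)
n, n_rep, k = 500, 1000, 4
p_k = 7 / 16  # theoretical P(X = 4) under the fair die

z_theoretical, z_plugin = [], []
for _ in range(n_rep):
    x = [max(random.randint(1, 4), random.randint(1, 4)) for _ in range(n)]
    p_hat = sum(1 for xi in x if xi == k) / n
    # Equation (1): standardize with the true variance p_k (1 - p_k)
    z_theoretical.append((n ** 0.5) * (p_hat - p_k) / (p_k * (1 - p_k)) ** 0.5)
    # Equation (2): standardize with the plug-in variance p_hat (1 - p_hat)
    z_plugin.append((n ** 0.5) * (p_hat - p_k) / (p_hat * (1 - p_hat)) ** 0.5)

# Both should have mean near 0 and standard deviation near 1.
print(statistics.mean(z_theoretical), statistics.stdev(z_theoretical))
print(statistics.mean(z_plugin), statistics.stdev(z_plugin))
```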

## 
## ### Shapiro-Wilk p-values for max_1 
## 
## # A tibble: 4 × 3
##   n     `Z: Plug-in` `Z: Theoretical`
##   <fct> <chr>        <chr>           
## 1 50    0.0000*      0.0000*         
## 2 100   0.0000*      0.0000*         
## 3 500   0.0000*      0.0053*         
## 4 1000  0.0050*      0.0006*         
## 
## ### Shapiro-Wilk p-values for max_2 
## 
## # A tibble: 4 × 3
##   n     `Z: Plug-in` `Z: Theoretical`
##   <fct> <chr>        <chr>           
## 1 50    0.0000*      0.0000*         
## 2 100   0.0000*      0.0000*         
## 3 500   0.0073*      0.1608          
## 4 1000  0.0075*      0.2514          
## 
## ### Shapiro-Wilk p-values for max_3 
## 
## # A tibble: 4 × 3
##   n     `Z: Plug-in` `Z: Theoretical`
##   <fct> <chr>        <chr>           
## 1 50    0.0000*      0.0000*         
## 2 100   0.0000*      0.0015*         
## 3 500   0.6937       0.3390          
## 4 1000  0.0115*      0.0882          
## 
## ### Shapiro-Wilk p-values for max_4 
## 
## # A tibble: 4 × 3
##   n     `Z: Plug-in` `Z: Theoretical`
##   <fct> <chr>        <chr>           
## 1 50    0.0000*      0.0000*         
## 2 100   0.0003*      0.0006*         
## 3 500   0.1226       0.1109          
## 4 1000  0.1553       0.1235

3.2 Joint distribution and correlation

  • Marginal Distribution

  • Order Statistics

## The covariance matrix:
##           max_1     max_2     max_3     max_4
## max_1  0.000116 -0.000024 -0.000032 -0.000060
## max_2 -0.000024  0.000314 -0.000128 -0.000162
## max_3 -0.000032 -0.000128  0.000440 -0.000280
## max_4 -0.000060 -0.000162 -0.000280  0.000503
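The simulated covariances can be compared with theory: the counts \((n\hat{p}_1, \ldots, n\hat{p}_4)\) are \(\text{Multinomial}(n, p)\), so \(\text{Var}(\hat{p}_k) = p_k(1 - p_k)/n\) and \(\text{Cov}(\hat{p}_j, \hat{p}_k) = -p_j p_k / n\) for \(j \neq k\). A minimal Python sketch (the value \(n = 500\) is an assumption that appears consistent with the printed matrix):

```python
# Theoretical covariance matrix of (p_hat_1, ..., p_hat_4) under the fair die.
n = 500  # assumption: the printed matrix appears consistent with n = 500
p = [(2 * k - 1) / 16 for k in range(1, 5)]

cov = [[p[j] * (1 - p[j]) / n if j == k else -p[j] * p[k] / n
        for k in range(4)] for j in range(4)]

for row in cov:
    print(["%.6f" % v for v in row])
```

Note the off-diagonal entries are all negative: since the \(\hat{p}_k\) sum to 1, a larger estimate for one category forces the others down, and every row of the matrix sums to 0.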

3.3 Sample mean

  • Sample Mean: \(\bar{X}_n \approx N\left( \frac{25}{8}, \, \frac{55}{64n} \right)\)

## # A tibble: 4 × 2
##       n p_value
##   <dbl> <chr>  
## 1    50 0.0001*
## 2   100 0.0005*
## 3   500 0.3353 
## 4  1000 0.2094

  • Weak Law of Large Numbers (WLLN): \(\bar{X}_n \xrightarrow{p} E(X) = \frac{25}{8}\).

  • Central Limit Theorem (CLT), with \(E(X)=\frac{25}{8}\) and \(\text{Var}(X)=\frac{55}{64}\):

    • Using the true variance: \(\frac{\sqrt{n}(\bar{X}_n - E(X))}{\sqrt{\text{Var}(X)}} \xrightarrow{d} N(0, 1)\)

    • Using the sample variance: \(\frac{\sqrt{n}(\bar{X}_n - E(X))}{\sqrt{S^2_n}} \xrightarrow{d} N(0, 1)\)
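Both standardizations of the sample mean can be simulated; a minimal Python sketch (parameters are illustrative):

```python
import random
import statistics

random.seed(7)
n, n_rep = 500, 1000
mu, var = 25 / 8, 55 / 64          # E(X) and Var(X) from Section 1

z_true, z_sample = [], []
for _ in range(n_rep):
    x = [max(random.randint(1, 4), random.randint(1, 4)) for _ in range(n)]
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    z_true.append((n ** 0.5) * (xbar - mu) / var ** 0.5)    # true variance
    z_sample.append((n ** 0.5) * (xbar - mu) / s2 ** 0.5)   # sample variance

# Both sets of Z-values should be approximately N(0, 1).
print(statistics.mean(z_true), statistics.stdev(z_true))
print(statistics.mean(z_sample), statistics.stdev(z_sample))
```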

3.4 Sample variance

  • Sample Variance:
    \[ S_n^2 = \frac{1}{n - 1} \sum_{i = 1}^{n} (Y_i - \bar{Y}_n)^2 \]

    • \(E(S_n^2) = \sigma^2\): an unbiased estimator of the population variance

    • \(\text{Var}(S_n^2) = \frac{1}{n} (\kappa - 1 + \frac{2}{n-1}) \sigma^4\), where \(\kappa\) is the kurtosis of the population distribution.

    • \(S_n^2\) is a consistent estimator of the population variance \(\sigma^2\); that is, \(S_n^2 \xrightarrow{p} \sigma^2\) as \(n \to \infty\).

    • \(\sqrt{n}(S_n^2 - \sigma^2) \xrightarrow{d} N\left( 0, (\kappa - 1) \sigma^4 \right)\) (try proving this; it can also be verified by simulation)

      • This together with \(\sqrt{\kappa - 1}\, S_n^2 \xrightarrow{p} \sqrt{\kappa - 1}\, \sigma^2\) yields \[ \frac{\sqrt{n}(S_n^2 - \sigma^2)}{\sqrt{\kappa - 1}\, S_n^2} \xrightarrow{d} N\left( 0, 1 \right) . \]
    • For the normal distribution, we have:

      • Kurtosis \(\kappa = 3\), \(\text{Var}(S_n^2) = \frac{2 \sigma^4}{n - 1}\) and \(\frac{(n - 1) S_n^2}{\sigma^2} \sim \chi^2_{n - 1}\).
  • A biased sample variance: \[ {S'}_n^2 = \frac{1}{n} \sum_{i = 1}^{n} (Y_i - \bar{Y}_n)^2 \]

    • \(E({S'}_n^2) = \frac{n - 1}{n} \sigma^2 < \sigma^2\)

    • This estimator is biased, but consistent as \(n \to \infty\).

  • Sample Standard Deviation: \(S_n = \sqrt{S_n^2}\)

    • \(E(S_n) < \sigma\): a biased estimator of \(\sigma\)

    • For the normal distribution, an unbiased estimator of \(\sigma\) is given by: \[ \hat{\sigma} = \frac{S_n}{c_4(n)} \quad \text{where } c_4(n) = \sqrt{\frac{2}{n-1}} \cdot \frac{\Gamma(n/2)}{\Gamma((n-1)/2)} \]

    • See Wikipedia: Unbiased estimation of standard deviation for details.
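The unbiasedness of \(S_n^2\), the downward bias of \(S_n\), and the \(c_4(n)\) correction can all be checked by simulation from a normal population. A minimal Python sketch (sample size and replication count are illustrative; `lgamma` is used instead of `gamma` for numerical stability):

```python
import math
import random
import statistics

def c4(n):
    """c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    return math.sqrt(2 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

random.seed(3)
n, n_rep = 10, 20000
s2_vals, s_vals = [], []
for _ in range(n_rep):
    y = [random.gauss(0, 1) for _ in range(n)]   # normal population, sigma = 1
    ybar = sum(y) / n
    s2_vals.append(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    s_vals.append(s2_vals[-1] ** 0.5)

print(statistics.mean(s2_vals))          # close to sigma^2 = 1: S_n^2 is unbiased
print(statistics.mean(s_vals), c4(n))    # E(S_n) is biased low, approx c4(n)
print(statistics.mean(s_vals) / c4(n))   # close to sigma = 1 after the c4 correction
```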

3.5 The sample mean and sample variance are not independent

Food for thought: if the die is unfair, then:

  • What would the distribution \(P(X = k)\) look like? What values would the expectation and variance take?

  • Besides computing by hand, how else could you compute them?
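One route to the last question: the same argument as in Section 1 gives \(P(X = k) = F(k)^2 - F(k-1)^2\), where \(F\) is the CDF of a single roll, for any face probabilities. A minimal Python sketch using exact fractions (the unfair weights below are illustrative):

```python
from fractions import Fraction

def pmf_max(weights, rolls=2):
    """PMF of the maximum of `rolls` independent rolls of a (possibly unfair) die.

    P(X = k) = F(k)^rolls - F(k-1)^rolls, where F is the CDF of one roll and
    `weights` are the face probabilities p_1, ..., p_m (summing to 1).
    """
    cdf, total = [], Fraction(0)
    for w in weights:
        total += w
        cdf.append(total)
    return [cdf[i] ** rolls - (cdf[i - 1] ** rolls if i > 0 else 0)
            for i in range(len(weights))]

fair = [Fraction(1, 4)] * 4
print(pmf_max(fair))                 # recovers (2k - 1)/16 for k = 1, 2, 3, 4

unfair = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
pmf_u = pmf_max(unfair)
mean_u = sum((k + 1) * p for k, p in enumerate(pmf_u))
print(pmf_u, mean_u)                 # exact PMF and expectation for the unfair die
```

The variance follows the same way from \(\sum_k k^2\, P(X = k) - E(X)^2\); alternatively, these values can be approximated by Monte Carlo simulation as in Section 3.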